Acronym Disambiguation: A Domain Independent Approach
Abstract
Acronyms appear in nearly all documents as shorthand for information that is repetitive and well known. However, an acronym can be ambiguous because the same acronym may have many expansions. In this paper, we propose a general system for acronym expansion that can work on any acronym, given some information about the context in which it is used. We present methods for retrieving all the possible expansions of an acronym from Wikipedia and AcronymsFinder.com. We propose using these expansions to collect the contexts in which they are used and then scoring them with a deep learning technique called Doc2Vec. Together, these components achieve an accuracy of 90.9% in selecting the correct expansion for a given acronym on a dataset we scraped from Wikipedia, containing 707 distinct acronyms and 14,876 disambiguations.
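The scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a plain bag-of-words cosine similarity stands in for Doc2Vec so the example runs with the standard library only, and the function names, candidate expansions, and context strings are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_expansions(query_context, candidate_contexts):
    """Return (expansion, score) pairs sorted from best to worst match."""
    query_vec = bag_of_words(query_context)
    scored = [
        (expansion, cosine(query_vec, bag_of_words(context)))
        for expansion, context in candidate_contexts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates for "ATM" with made-up context text
# (assumption: in the paper, such contexts are collected per expansion).
candidates = {
    "Automated Teller Machine": "bank cash withdrawal card machine account money",
    "Asynchronous Transfer Mode": "network cells switching protocol bandwidth telecom",
}
query = "She withdrew cash from the ATM using her bank card."
ranking = rank_expansions(query, candidates)
best_expansion = ranking[0][0]
```

In a setup closer to the one the abstract describes, the count vectors would presumably be replaced by learned document embeddings, while the ranking step would stay the same: embed the query context, compare it with each candidate's context, and pick the highest-scoring expansion.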
Similar resources
Processus global d'acquisition et de gestion des sigles
This paper deals with an acronym/definition extraction approach from textual data (corpora) and the disambiguation of these definitions (or expansions). Both steps of our global process of acquisition and management of acronyms are precisely described. The first step consists in using markers such as brackets to identify expansion candidates. The alignment of the letters allows to select the ac...
Automated Identification of Synonyms in Biomedical Acronym Sense Inventories
Acronyms are increasingly prevalent in biomedical text, and the task of acronym disambiguation is fundamentally important for biomedical natural language processing systems. Several groups have generated sense inventories of acronym long form expansions from the biomedical literature. Long form sense inventories, however, may contain conceptually redundant expansions that negatively affect thei...
Managing the Acronym/Expansion Identification Process for Text-Mining Applications
This paper deals with an acronym/definition extraction approach from textual data (corpora) and the disambiguation of these definitions (or expansions). Both steps of our global process of acquisition and management of acronyms are precisely described. The first step consists in using markers such as brackets to identify expansion candidates. The alignment of the letters allows to select the ac...
Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts
Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes, radiology reports and discharge summaries. In the medical domain, a significant part of the general problem of text normalization is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts and knowing their meaning is criti...
Extraction and Disambiguation of Acronym Meaning-Pairs in Medline
Acronyms are widely used in biomedical and other technical texts. Understanding their meaning constitutes an important problem in the automatic extraction and mining of information from text. Moreover, an even harder problem is sense disambiguation of acronyms; that is, where a single acronym, termed a polynym, has a multiplicity of meanings, a common occurrence in the biomedical literature. In...